Canadian Birth Probability Analysis
By Nicole Bidwell
Introduction
This analysis explores the chance of being born in Canada in a particular year. A description of how to reproduce the analysis by running the data pipeline can be found in the README.md file. The outline below serves as a summary, explanation, and interpretation of the analysis.
Data
The data used for this analysis is the birth rate and population data obtained through the World Bank's API.
The Chosen Years
This analysis explores the years 2010 to 2023. An example calculation for the probability of being born in Canada for year 2012 is included, along with the changes in probability over time across all countries.
Retrieving and Loading the Data
The script retrieve_load.py, found in the src folder, obtains the required data from the World Bank API in JSON format. It pulls data from 2010 to 2023 across multiple pages for both birth rate and population, then uses sqlite3 to create a table, load the data into the database, and query for the required subset of the data.
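As a rough illustration of the multi-page retrieval, the sketch below keeps requesting pages until it reaches the page count reported in the response metadata. It assumes the World Bank v2 indicator endpoint, which returns a two-element JSON array (paging metadata followed by the records), and the indicator codes SP.DYN.CBRT.IN (crude birth rate) and SP.POP.TOTL (total population). The function names and the injectable `get_json` parameter are hypothetical and are not taken from retrieve_load.py.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for World Bank indicator data covering all countries.
BASE_URL = "https://api.worldbank.org/v2/country/all/indicator"


def http_get_json(url):
    """Fetch a URL and decode its JSON body (performs a real network call)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def fetch_indicator(indicator, start=2010, end=2023, per_page=1000,
                    get_json=http_get_json):
    """Collect every record for one indicator across all result pages."""
    records = []
    page = 1
    while True:
        query = urlencode(
            {"format": "json", "date": f"{start}:{end}",
             "per_page": per_page, "page": page},
            safe=":",
        )
        payload = get_json(f"{BASE_URL}/{indicator}?{query}")
        # A successful response is [metadata, rows]; anything else is an error.
        if not isinstance(payload, list) or len(payload) < 2:
            raise ValueError(f"unexpected response for page {page}: {payload!r}")
        meta, rows = payload[0], payload[1]
        records.extend(rows or [])
        if page >= int(meta["pages"]):
            break
        page += 1
    return records
```

Passing the fetcher in as a parameter keeps the paging logic testable without network access; calling `fetch_indicator("SP.DYN.CBRT.IN")` with the default fetcher would contact the live API.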
When querying for the required subset of data, it is important to filter for valid countries, since the original data also includes entries for countries grouped into specific regions. Including these regions in the processed data would have resulted in overcounting in the later calculations of total births.
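The filtering step could be sketched with sqlite3 as follows. The table layout, the column names, and the `is_aggregate` flag are assumptions made for illustration (one way to derive such a flag is from the World Bank country metadata, which labels grouped entries with the region value "Aggregates"). The sample rows use placeholder numbers, not values from the real dataset.

```python
import sqlite3

# Placeholder rows: (iso3, name, is_aggregate, year, birth_rate, population).
SAMPLE_ROWS = [
    ("CAN", "Canada", 0, 2012, 11.0, 34_700_000),
    ("IND", "India", 0, 2012, 21.0, 1_270_000_000),
    ("WLD", "World", 1, 2012, 19.5, 7_100_000_000),  # regional aggregate
]


def load_and_query(rows):
    """Load rows into an in-memory database and return only real countries."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            """CREATE TABLE country_data (
                   iso3 TEXT, name TEXT, is_aggregate INTEGER,
                   year INTEGER, birth_rate REAL, population INTEGER)"""
        )
        conn.executemany(
            "INSERT INTO country_data VALUES (?, ?, ?, ?, ?, ?)", rows
        )
        # Excluding aggregates avoids counting the same births twice.
        cursor = conn.execute(
            """SELECT iso3, name, year, birth_rate, population
               FROM country_data
               WHERE is_aggregate = 0
               ORDER BY iso3, year"""
        )
        return cursor.fetchall()
    finally:
        conn.close()


countries = load_and_query(SAMPLE_ROWS)
```

Here the "World" row is dropped, so summing births over `countries` would not double count the births already attributed to Canada and India.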
The Processed Data
After querying for the required data, a pandas dataframe is created which contains the country information (ISO3 code, id, and name), year, birth rate, and population. The data is saved as a csv file, country_br_pop.csv, in the data folder for later use.
Calculated Values
The script calculate_probabilities.py in the src folder is used to perform the probability calculations.
Number of Births
After loading the country_br_pop.csv data, a column named birth is added to the data frame. This provides the number of births in each year for each country. Since the birth rate is expressed per 1,000 people, the formula is:
$$\text{Number of Births} = \frac{\text{Birth Rate}}{1000}\times\text{Population}$$
These values are used in the following calculations.
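To make the formula above concrete, here is a minimal sketch in plain Python (rather than the pandas column operation used in the pipeline). The function name and the input values are illustrative assumptions, not taken from calculate_probabilities.py or country_br_pop.csv.

```python
def number_of_births(birth_rate_per_1000, population):
    """Births in one year from a birth rate expressed per 1,000 people."""
    return birth_rate_per_1000 / 1000 * population


# Illustrative inputs: 10 births per 1,000 people in a population of 2 million.
births = number_of_births(birth_rate_per_1000=10.0, population=2_000_000)
```

With these inputs the result is about 20,000 births for the year.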
Probability of Being Born in Canada for 2012
To calculate the probability of being born in Canada for a specified year I created the function calc_probability_country. This function calculates the percentage probability of being born in any specified country for any specified year within the dataset. The two formulas used are:
$$\text{Total Worldwide Births in the Year} = \text{sum of all countries' births in the year}$$
$$\text{Percentage Probability for a Country} = \frac{\text{Country's Number of Births in the Year}}{\text{Total Worldwide Births in the Year}}\times 100$$
To calculate the probability of being born in Canada for 2012, the function is called with Canada for the country parameter and 2012 for the year parameter. For a more tangible interpretation, I included the equivalent ratio using the formula:
$$\text{Ratio Value} = \frac{1}{\text{Percentage Probability}}\times100$$
These values are saved in the output folder.
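The two probability formulas and the ratio formula can be combined in a short sketch. The function below is a hypothetical stand-in for calc_probability_country: its name, signature, and list-of-dicts input are assumptions, and the records are illustrative numbers only.

```python
def probability_for_country(records, country, year):
    """Return (percentage probability, 1-in-N ratio) for one country and year.

    `records` is a list of dicts with keys "country", "year", and "births".
    """
    in_year = [r for r in records if r["year"] == year]
    total_births = sum(r["births"] for r in in_year)
    country_births = sum(r["births"] for r in in_year if r["country"] == country)
    if total_births == 0 or country_births == 0:
        raise ValueError(f"no births recorded for {country!r} in {year}")
    percentage = country_births / total_births * 100
    ratio = 1 / percentage * 100  # "1 in ratio" births occur in the country
    return percentage, ratio


# Illustrative numbers only.
records = [
    {"country": "A", "year": 2012, "births": 250},
    {"country": "B", "year": 2012, "births": 750},
    {"country": "A", "year": 2013, "births": 400},
]
percentage, ratio = probability_for_country(records, "A", 2012)
```

In this toy example country A accounts for about 25% of births in 2012, or roughly 1 in 4.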
Probability of Being Born in any Specified Country for any Specified Year
The calc_probability_country function was also used to calculate the probabilities of being born in every other country in the dataset for each year. These values are saved in the csv file, countries_prob.csv, in the data folder.
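One way to produce the probability for every country and year is to total the births per year once and then divide each record by its year's total. This is a sketch of that alternative, not a description of how the pipeline calls calc_probability_country; the function name and input structure are assumptions and the numbers are illustrative.

```python
from collections import defaultdict


def all_probabilities(records):
    """Return {(country, year): percentage probability} for every record."""
    totals = defaultdict(float)
    for r in records:
        totals[r["year"]] += r["births"]
    return {
        (r["country"], r["year"]): r["births"] / totals[r["year"]] * 100
        for r in records
        if totals[r["year"]] > 0  # skip years with no recorded births
    }


# Illustrative numbers only.
records = [
    {"country": "A", "year": 2012, "births": 250},
    {"country": "B", "year": 2012, "births": 750},
    {"country": "A", "year": 2013, "births": 300},
    {"country": "B", "year": 2013, "births": 300},
]
probs = all_probabilities(records)
```

A useful sanity check on the output is that the probabilities within each year sum to 100.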
Global Average Number of Births per Year
The last value I calculated was the global average number of births per year, obtained using the following formula. This value later provides insight when interpreting the differences in birth probabilities from year to year.
$$\text{Global Average Number of Births per Year} = \frac{\sum{\text{(Total Births per Year)}}}{\text{Total Number of Years}}$$
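The averaging formula above amounts to summing births within each year and then taking the mean of those yearly totals. A minimal sketch, with a hypothetical function name and illustrative records:

```python
from collections import defaultdict


def global_average_births(records):
    """Mean of the yearly worldwide birth totals across all years present."""
    totals = defaultdict(float)
    for r in records:
        totals[r["year"]] += r["births"]
    if not totals:
        raise ValueError("no records supplied")
    return sum(totals.values()) / len(totals)


# Illustrative numbers only: yearly totals of 1,000 and 600 births.
records = [
    {"country": "A", "year": 2012, "births": 250},
    {"country": "B", "year": 2012, "births": 750},
    {"country": "A", "year": 2013, "births": 300},
    {"country": "B", "year": 2013, "births": 300},
]
average = global_average_births(records)
```

For these records the yearly totals are 1,000 and 600, so the average is 800 births per year.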
Results and Interpretation
Probability of Being Born in Canada in 2012
The probability of being born in Canada in 2012 is $0.261\%$, which indicates that, on average, 1 out of every 383 people born in 2012 was born in Canada.
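As a quick check, substituting the rounded percentage into the ratio formula gives approximately the reported figure. The small gap from 383 presumably comes from rounding the percentage to three decimal places, which is an assumption about how the values were rounded:

$$\text{Ratio Value} = \frac{1}{0.261}\times 100 \approx 383.1$$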
Data Visualization and Interpretation
The script graphs.py in the src folder is used to generate plots using Plotly Graph Objects and Plotly Express, which are then saved in the output folder. These plots allow for easier interpretation and deeper analysis of the birth probabilities.
Canada Bar Chart for 2012
This plot displays the probability of being born in Canada in 2012. It provides a straightforward visual comparison between the probability of being born in Canada and elsewhere in 2012.
Here we see that the probability of being born in Canada in 2012 appears to be small. Hovering over the bars confirms the exact values. While this value appears small, further analysis provides additional insight.
Canada Trend Line Over Time
This plot displays the change in probability of being born in Canada from 2010 to 2023.
Here we see that the 2012 probability of $0.261\%$ is the minimum value over the period from 2010 to 2023. Notably, the maximum probability is $0.278\%$, which occurred in 2021. This gives a range of $0.017$ percentage points ($0.278\% - 0.261\%$). The difference may seem small, but when we consider the global average number of births per year, calculated to be roughly $140,039,787.23$, a difference of $0.017$ percentage points corresponds to roughly $23,807$ (calculated from $0.017/100\times140,039,787.23$) more people born in Canada in 2021 compared to 2012, assuming the global number of births in both years is close to that average.
Top 5, Bottom 5, and Canada Time Line
Similar to the Canada Trend Line Over Time plot, this plot adds the probability trend lines of other countries from 2010 to 2023. The countries included are the 5 with the highest average probability (India, China, Nigeria, Pakistan, and Indonesia) and the 5 with the lowest average probability (Nauru, British Virgin Islands, San Marino, Tuvalu, and Palau), along with Canada for comparison.
The values in the legend can be clicked to better display overlapping trend lines.
While Canada is not one of the countries with the lowest birth probabilities, it remains closer to the bottom than the top. From this graph it is also evident that many of the countries appear to have relatively stable birth probabilities over time, except for China and India. In China we see a downward trend following 2017. In India we see a slight downward trend from 2010 to 2014, followed by more stability in the following years.
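The selection of the top 5 and bottom 5 countries by average probability, described at the start of this section, could be sketched as below. The function name and the dictionary input keyed by (country, year) are assumptions for illustration, and the values are made up; this is not the code in graphs.py.

```python
from collections import defaultdict
from statistics import mean


def rank_by_average(probabilities, n=5):
    """Return (top n, bottom n) country names ranked by mean probability.

    `probabilities` maps (country, year) to a percentage probability.
    Both lists are ordered from highest to lowest average.
    """
    by_country = defaultdict(list)
    for (country, _year), pct in probabilities.items():
        by_country[country].append(pct)
    ranked = sorted(by_country, key=lambda c: mean(by_country[c]), reverse=True)
    return ranked[:n], ranked[-n:]


# Illustrative values only.
probabilities = {
    ("A", 2012): 50.0, ("A", 2013): 40.0,
    ("B", 2012): 30.0, ("B", 2013): 35.0,
    ("C", 2012): 20.0, ("C", 2013): 25.0,
}
top, bottom = rank_by_average(probabilities, n=1)
```

Note that when `n` exceeds half the number of countries, the two lists overlap, so the sketch is only meaningful for datasets with more than `2 * n` countries.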
Conclusions
This analysis found that the probability of being born in Canada in 2012 is $0.261\%$ (or 1 in 383). We also dove deeper to gain insight into the changes in birth probabilities over the period from 2010 to 2023. Interpreting the trend lines, we can see that the biggest change in birth probabilities in Canada is between 2012 ($0.261\%$) and 2021 ($0.278\%$). This change of $0.017$ percentage points equates to roughly $23,807$ more people being born in Canada in 2021 compared to 2012, based on the global average of roughly 140 million births per year. That said, Canada's birth probabilities remained relatively stable compared to countries like China and India, which had the highest birth probabilities but also displayed more fluctuation.